GH-46270: [C++][Parquet] Clarify GeoStatistics docstring #46649
paleolimbot merged 1 commit into apache:main
@paleolimbot I'm curious about this in the method docstrings:

/// For statistics read from a Parquet file, dimension_empty() will always contain
/// false values because there is no mechanism to communicate an empty interval
/// in the Thrift metadata.

Why was it done that way, if emptiness is useful information to have? And is there a point in exposing emptiness in our geostats APIs? Usually, people want to filter on a Parquet file read from disk, not one that is being constructed in memory...
The PR where we discussed this is apache/parquet-format#494 ...the consensus was that checking the
We use the same API for producing and consuming GeoStatistics (this was modelled after the regular Statistics). We could change the write path to use only internals, although I am not sure this would be less confusing.
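To make the "invalid" versus "empty" distinction concrete, here is a minimal, self-contained sketch. It does not use the real `parquet::geospatial::GeoStatistics` class: `DimensionInterval`, `Update`, `Invalidate`, `is_empty`, and `MightIntersect` are hypothetical names invented for illustration, and the sketch assumes that "empty" means "no values were seen" while "invalid" means "the bounds are unknown and cannot be trusted".

```cpp
#include <algorithm>
#include <cassert>
#include <limits>

// Hypothetical stand-in for one dimension (e.g. X) of a bounding box.
// This is NOT the Arrow/Parquet API; it only models the two states discussed.
struct DimensionInterval {
  // An interval that has seen no values starts "inside out": min > max.
  double min = std::numeric_limits<double>::infinity();
  double max = -std::numeric_limits<double>::infinity();
  // Assumption: false means the bounds are unknown (e.g. not recorded).
  bool valid = true;

  // Writer side: widen the interval to include one more coordinate.
  void Update(double value) {
    min = std::min(min, value);
    max = std::max(max, value);
  }

  // Mark the bounds as unusable.
  void Invalidate() { valid = false; }

  // Empty: the bounds are trustworthy, but no value was ever added.
  bool is_empty() const { return valid && min > max; }
};

// Reader side: can a chunk with these statistics contain a value in
// [query_min, query_max]? Returning false means the chunk can be skipped.
inline bool MightIntersect(const DimensionInterval& stats, double query_min,
                           double query_max) {
  assert(query_min <= query_max);
  if (!stats.valid) return true;       // Unknown bounds: must not skip.
  if (stats.is_empty()) return false;  // Known to hold nothing: safe to skip.
  return stats.min <= query_max && stats.max >= query_min;
}
```

In this model an empty interval allows a chunk to be skipped, whereas an invalid one forces the reader to scan it. Because statistics read from a file never report an empty interval, a reader following the docstring would only ever take the first or the last branch of `MightIntersect`.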
After merging your PR, Conbench analyzed the 4 benchmarking runs that have been run so far on merge-commit 8d44eea. There were 68 benchmark results with an error:
There were no benchmark performance regressions. 🎉 The full Conbench report has more details. It also includes information about 3 possible false positives for unstable benchmarks that are known to sometimes produce them.
Rationale for this change
The distinction between "invalid" and "empty" is not clear in the current documentation!
What changes are included in this PR?
The docstring for GeoStatistics was improved.
Are these changes tested?
Just documentation!
Are there any user-facing changes?
No
GeoStatistics::dimension_empty docstring #46270